Coronaviruses can infect a variety of animals including poultry, livestock, and humans and are currently 
classified into three groups. The interspecies transmissions of coronaviruses between different hosts form a 
complex ecosystem of which little is known. The outbreak of severe acute respiratory syndrome (SARS) and the 
recent identification of new coronaviruses have highlighted the necessity for further investigation of coronavirus 
ecology, in particular the role of bats and other wild animals. In this study, we sampled bat populations 
in 15 provinces of China and found that approximately 6.5% of the bats, from diverse species distributed 
throughout the region, harbor coronaviruses. Full genomes of four coronaviruses from bats were sequenced and 
analyzed. Phylogenetic analyses of the spike, envelope, membrane, and nucleoprotein structural proteins and the 
two conserved replicase domains, putative RNA-dependent RNA polymerase and RNA helicase, revealed that 
bat coronaviruses cluster in three different groups: group 1, another group that includes all SARS and 
SARS-like coronaviruses (putative group 4), and an independent bat coronavirus group (putative group 5). 

Further genetic analyses showed that different species of bats maintain coronaviruses from different groups 
and that a single bat species from different geographic locations supports similar coronaviruses. Thus, the 
findings of this study suggest that bats may play an integral role in the ecology and evolution of coronaviruses. 


Coronaviruses (CoVs) are enveloped, single-stranded, positive-sense 
RNA viruses (4). Based on serological and genetic 
features, coronaviruses are classified into three groups (14). 
These viruses infect various species of poultry, livestock, and 
pets and also humans, causing acute and chronic respiratory, 
enteric, hepatic, and central nervous system diseases (45). 
Prior to the outbreak of severe acute respiratory syndrome 
(SARS), most of our knowledge regarding coronaviruses resulted 
from investigations associated with animal health. As 
such, until recently, the evolutionary and ecological aspects of 
coronaviruses had not been extensively studied. 

In March 2003, after the outbreak of SARS, a novel coronavirus 
(SARS-CoV) was identified as the etiologic agent responsible 
for human infection (20, 31). The identification of 
SARS-like coronavirus in Himalayan palm civets and raccoon 
dogs in live-animal markets in southern China suggested that 
SARS was a possible zoonosis (10). Further virological surveillance 
confirmed that the infectious source of SARS was 
those live-animal markets and confirmed its zoonotic origin 
(11). However, subsequent studies suggested that those market 
animals were intermediate hosts rather than the natural reservoirs 
of SARS-CoV, as extensive surveillance studies did not 
detect the virus in either farmed or wild animals of the same 
species (17). 

Recent studies have suggested horseshoe bats (Rhinolophus 
spp.) as possible natural reservoirs of SARS-like coronavirus 
(23, 25). However, genome sequence comparison of the spike 
(S) genes from bat SARS-like coronavirus and civet SARS-like 
coronavirus revealed only 64% genetic homology, suggesting 
that the evolutionary pathway of SARS-CoV remains to be 
fully described. Given the high biodiversity of bats, their 
significant population size, broad geographical distribution, 
and ability to migrate, along with the detection of many 
emerging viruses (1, 7), it is reasonable to consider that bats 
may contain the direct progenitor of SARS-CoV. Moreover, a 
growing number of novel coronaviruses have recently been 
identified, such as HCoV-NL63 (42) and HCoV-HKU1 (47) 
from humans and some avian infectious bronchitis virus (IBV)- 
like coronaviruses from different avian species (16, 26). These 
accumulated findings suggest that coronaviruses may have a 
much wider distribution in the animal kingdom than previously 
thought. 

To explore the natural distribution of the virus in bat populations 
and also to understand the possible role of bats in 
coronavirus ecology, we conducted a virological surveillance 
study in China. Genetic analysis revealed that bat coronaviruses 
mainly clustered into three different groups: group 1, a 
group including all SARS and SARS-like coronaviruses from 
different hosts (putative group 4), and an independent bat 
coronavirus group (putative group 5). Further characterization 
of bat coronaviruses revealed high genetic diversity across a 
large geographic distribution and revealed that different species 
of bats maintain coronaviruses from different groups and 
that the same species of bat from different geographic locations 
can also contain the same type of coronavirus. Thus, the 
findings of this study suggest that bats may play an integral role 
in the ecology and evolution of coronaviruses. 

FIG. 1. Map of China showing 15 provinces where coronavirus 
surveillance in bats was conducted. Numbers indicate number of sites 
positive over the total number of sites sampled in each province. 

MATERIALS AND METHODS 

Sampling. From November 2004 to March 2006, 985 bats from 35 species 
(belonging to 14 genera in three families) were captured and sampled from their 
natural habitats at 82 locations in 15 provinces throughout China (Fig. 1; Table 
1). Bat identification in the field was initially determined by morphology (1, 44), 
and identifications were confirmed by sequence analysis of the mitochondrial 
cytochrome b DNA as previously described (43). Identification of species in the 
genus Myotis could not be readily made, and these specimens were recorded as 
Myotis sp. Most bats were captured in the wild from natural roosts; however, 
some were also sampled from populated areas. Oropharyngeal and anal swabs 
from each bat were taken, placed in transport medium, kept in liquid nitrogen for 
transportation to the laboratory, and then stored at -80°C. All captured bats 
were released after samples were taken. 

Viral detection and isolation. Viral RNA was extracted from oropharyngeal 
and anal swabs with the QIAamp viral RNA minikit (QIAGEN, Westburg, The 
Netherlands) and used as the template for reverse transcription-PCR (RT-PCR) 
detection of the coronavirus RNA-dependent RNA polymerase (RdRp) gene as 
previously described (10). Primers conserved for all known coronaviruses were 
designed for RT-PCR detection (39). Primer sequences are available upon request. 
The RdRp PCR products were gel purified using the QIAquick PCR 
purification kit (QIAGEN) and sequenced to confirm virus identification (see 
below). Virus isolation was attempted using several cell lines (FRHK4, Vero E6, 
and CV1) for five of the PCR-positive samples; however, no cytopathic effect was 
observed in any of the cell lines. PCR detection confirmed that no viruses had 
grown in cell culture. 

As many coronaviruses have been recently identified in different animals from 
different regions, to avoid confusion the nomenclature of bat coronaviruses from 
this study is given in the following format: host, geographic location of sampling, 
sample number, and year, e.g., BtCoV/Rhinolophus ferrumequinum/Hubei/273/2004 
(abbreviation, BtCoV/273/04). 
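The naming format described above can be sketched in a few lines. This is only an illustration: the class name `BatCoVName` and its methods are hypothetical and are not part of the study, and the abbreviation rule (sample number plus two-digit year) is inferred from the example given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatCoVName:
    """Hypothetical container for the four naming fields listed in the text."""
    host: str       # bat species, e.g. "Rhinolophus ferrumequinum"
    location: str   # province where the sample was taken
    sample: str     # sample number, kept as text so "A911" also fits
    year: int       # four-digit sampling year

    def full(self) -> str:
        # host / geographic location / sample number / year
        return f"BtCoV/{self.host}/{self.location}/{self.sample}/{self.year}"

    def short(self) -> str:
        # Abbreviation: sample number and the last two digits of the year
        return f"BtCoV/{self.sample}/{self.year % 100:02d}"


name = BatCoVName("Rhinolophus ferrumequinum", "Hubei", "273", 2004)
print(name.full())
print(name.short())
```

Keeping the sample number as a string is a deliberate choice here, since some sample identifiers in the Results contain letters.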

Genome analysis. The nucleotide data obtained from diagnostic sequencing 
of the RdRp fragment were analyzed with available coronavirus sequences in 
GenBank and used to determine the diversity of the detected coronaviruses and 
to select representative strains for full genome sequencing. Four viruses in two 
new coronavirus lineages were selected for complete genome sequencing. 

RNA extraction was done using the viral RNA kit from QIAGEN, and cDNA 
synthesis was conducted with random hexamer, gene-specific, and oligo(dT) 
primers. Degenerate primers for cDNA amplification and sequencing were designed 
from multiple alignments of GenBank sequence data using the program 
CODEHOP (35). Conventional PCR using Platinum Taq DNA high-fidelity 
polymerase (Invitrogen) and gene-specific primers was then used for filling gaps 
between the CODEHOP-amplified regions. Shotgun sequencing (38) with the 
Zero Blunt PCR cloning kit (Invitrogen) was conducted for large PCR fragments 
generated from specific primers between the CODEHOP-amplified regions (35). 
For regions that could not be amplified using CODEHOP, we used rapid 
amplification of cDNA ends (RACE) with the second-generation 5'/3' RACE kit 
(Roche). Sequencing was performed by using the 
BigDye Terminator version 3.1 cycle sequencing kit on an ABI PRISM 3700 
DNA analyzer (Applied Biosystems) following the manufacturer’s instructions. 
All primer sequences are available upon request. 

The open reading frames (ORFs) of each of the four complete genomes were 
identified and mapped using the program SeqBuilder (Lasergene version 6.1; 
DNAStar, Madison, WI) and confirmed using Z-Curve (8). Homology searches 
of identified ORFs against other known coronaviruses were conducted in the 
GenBank and Pfam databases (2). Protein precursors produced by ORF1ab were 
predicted using the program Z-Curve (8). Prediction of transmembrane (TM) 
domains was performed using TMpred (12). 

Sequence similarity. Full-length amino acid alignments of each of the major 
gene products were used to calculate the similarity (p distances) within and 
between the different coronavirus groups, including putative groups 4 and 5, 
using MEGA3 (21). The virus sequences used in this analysis are the same as 
those in the phylogenetic trees. 
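As a rough illustration of the similarity measure, the sketch below computes a p distance (the proportion of compared sites that differ) for two aligned amino acid sequences and converts it to percent similarity. The function names and the toy sequences are assumptions made for this example; the study itself used MEGA3, and skipping gapped columns here is a simplifying choice rather than a description of MEGA3 settings.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption
    of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_similarity(a: str, b: str) -> float:
    """Similarity expressed as 100 * (1 - p distance)."""
    return 100.0 * (1.0 - p_distance(a, b))


# Toy aligned fragments (hypothetical, not from the study)
seq1 = "MKVLA"
seq2 = "MKILA"
print(p_distance(seq1, seq2), percent_similarity(seq1, seq2))
```

Within-group and between-group ranges such as those reported in the Results would then be the minimum and maximum of this value over the relevant sequence pairs.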

Phylogenetic studies. For the structural proteins, spike (S), membrane (M), 
envelope (E), and nucleocapsid (N), only full-length sequences were included in 
the analyses. For the replicase domains, the conserved sequence regions of the 
RdRp and helicase (HEL) were used. Multiple alignments of bat coronaviruses 
with other known coronaviruses were conducted with the programs TransAlign 
(3) and ClustalW (41) and manually optimized with Se-Al (33). Phylogenetic 
trees were constructed using the neighbor-joining criterion with the Jukes-Cantor 
model (JC69) in the programs MEGA3 (21) and PAUP* version 4.0b (40). Gaps 
were treated as missing data in all analyses. 
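For readers unfamiliar with the Jukes-Cantor correction, a minimal sketch follows. It converts the observed proportion of differing nucleotide sites, p, into the JC69 distance d = -(3/4) ln(1 - 4p/3), skipping gapped or ambiguous columns as missing data. The function name and the set of characters treated as missing are assumptions of this example; the trees in the study were built with MEGA3 and PAUP*, not with this code.

```python
import math


def jc69_distance(a: str, b: str, missing: str = "-N?") -> float:
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Sites where either sequence carries a character in `missing` are
    ignored, which mirrors treating gaps as missing data.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 0.75:
        # The correction is undefined once sequences look saturated
        raise ValueError("p >= 0.75: JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in four sites: p = 0.25, corrected distance is a little larger
print(jc69_distance("ACGT", "ACGA"))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm then turns into a tree.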

Recombination analysis. Sliding window analysis was used to detect recombination 
within the RdRp, S, and N genes. The same multiple alignments used for 
phylogenetic tree reconstruction, with the outgroup excluded, were analyzed 
using the difference of the sum of squares method (window size, 300, with steps 
of 100 amino acids) in the program Topali (29). The RDP method, as implemented 
in program RDP version 2 (28), was also used for recombination detection 
with the percentage of identity for recombinant sequences set from 0 to 100. 
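The window stepping used in such a scan can be sketched as follows, with the window size of 300 and step of 100 taken from the text. This is not the difference of the sum of squares statistic computed by Topali; the per-window mismatch proportion below is only a simple stand-in to show how a profile along the alignment is built, and both function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_windows(length: int, window: int = 300,
                    step: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) index pairs for every full window."""
    start = 0
    while start + window <= length:
        yield start, start + window
        start += step


def windowed_mismatch(a: str, b: str, window: int = 300,
                      step: int = 100) -> List[float]:
    """Proportion of differing sites in each window of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    profile = []
    for start, end in sliding_windows(len(a), window, step):
        diffs = sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)
        profile.append(diffs / (end - start))
    return profile


# A 600-column alignment gives four overlapping windows
print(list(sliding_windows(600)))
```

An abrupt change in such a profile between neighboring windows is the kind of signal that a formal recombination test would then evaluate statistically.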

Nucleotide sequence accession numbers. The sequences reported in this paper 
have been deposited in GenBank under accession numbers DQ648786 to 
DQ648797 and DQ648799 to DQ648858. 


RESULTS 

Surveillance and prevalence. To understand the prevalence 
and distribution of coronavirus in bats, 985 individuals belonging 
to 35 species, from three bat families, were sampled at 82 
sites in 15 provinces of China (Fig. 1; Table 1). All samples 
were collected from apparently healthy individuals and tested 
for the presence of coronavirus by RT-PCR detection of a 
440-bp RdRp gene fragment. A total of 64 (6.5%) samples 
tested positive in 19 of the 82 sites, located in 12 provinces 
(Fig. 1). Ten of the 35 species tested were found to harbor 
coronavirus; 57 (89%) positive samples were detected from six 
species of the family Vespertilionidae, and the rest were from 
four species of the Rhinolophidae. The Vespertilionidae and 
Rhinolophidae accounted for 48% and 47% of the samples, 
respectively (Table 1). 
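The percentages in this paragraph follow directly from the counts, as the short calculation below shows. The helper `percent` and the variable names are introduced here only for illustration; the input numbers are those stated in the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


overall_positive = percent(64, 985)      # positive samples among all bats tested
positive_sites = percent(19, 82)         # sites with at least one positive sample
vespertilionid_share = percent(57, 64)   # share of positives from Vespertilionidae

print(f"{overall_positive:.1f}% of bats positive")
print(f"{positive_sites:.1f}% of sites positive")
print(f"{vespertilionid_share:.0f}% of positives from Vespertilionidae")
```

The proportion of positive sites is not quoted as a percentage in the text; it is derived here from the 19 of 82 sites reported.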

Two colonies of bats, from different sampling sites, had 
much higher positive rates than average. One Miniopterus 
schreibersi colony had a 55% (11/20) positive rate, while a 
Pipistrellus abramus colony had a 35% (11/31) positive rate. All 
positive samples were from anal swabs, and none from throat 
swabs, suggesting that the gastrointestinal tract is the principal 
replication site of coronavirus infection in those bats. 

TABLE 1. Coronavirus distribution in different bat species and locations 

Sites and Bats columns give no. sampled (no. positive). 

Family and species of bat | Common name                        | Sites   | Bats     | Group(s) 
Rhinolophidae 
Rhinolophus pusillus      | Least horseshoe bat                | 27      | 101      | 
Rhinolophus malayanus     | Malayan horseshoe bat              | 2       | 15       | 
Rhinolophus affinis       | Intermediate horseshoe bat         | 9       | 60       | 
Rhinolophus ferrumequinum | Greater horseshoe bat              | 11 (3)  | 41 (4)   | 1, 4, and 5 
Rhinolophus thomasi       | Thomas's horseshoe bat             | 5       | 12       | 
Rhinolophus sinicus       | Chinese horseshoe bat              | 16 (1)  | 66 (1)   | 4 
Rhinolophus pearsoni      | Pearson's horseshoe bat            | 10 (1)  | 48 (1)   | 1 
Rhinolophus macrotis      | Big-eared horseshoe bat            | 11 (1)  | 38 (1)   | 4 
Rhinolophus rex           | King horseshoe bat                 | 2       | 2        | 
Rhinolophus luctus        | Woolly horseshoe bat               | 4       | 4        | 
Rhinolophus osgoodi       | Osgood's horseshoe bat             | 1       | 1        | 
Hipposideros armiger      | Great leaf-nosed bat               | 13      | 58       | 
Hipposideros larvatus     | Intermediate leaf-nosed bat        | 1       | 3        | 
Hipposideros pratti       | Pratt's leaf-nosed bat             | 2       | 9        | 
Hipposideros pomona       | Pomona leaf-nosed bat              | 1       | 1        | 
Coelops frithi            | East Asian tailless leaf-nosed bat | 2       | 6        | 
Aselliscus stoliczkanus   | Stoliczka's Asian trident bat      | 1       | 7        | 
Vespertilionidae 
Pipistrellus pipistrellus | Common pipistrelle                 | 4 (1)   | 27 (6)   | 5 
Pipistrellus abramus      | Japanese pipistrelle               | 8 (3)   | 41 (14)  | 5 
Scotophilus kuhlii        | Lesser Asiatic yellow house bat    | 2 (1)   | 43 (5)   | 1 
Myotis daubentonii        | Daubenton's bat                    | 4       | 41       | 
Myotis mystacinus         | Whiskered bat                      | 1       | 1        | 
Myotis ricketti           | Rickett's big-footed bat           | 8 (4)   | 53 (13)  | 1 
Myotis chinensis          | Large Myotis                       | 2       | 3        | 
Myotis sp.^a              |                                    | 9       | 80       | 
Nyctalus aviator          | Birdlike noctule                   | 2       | 6        | 
Nyctalus noctula          | Noctule bat                        | 3       | 17       | 
Scotomanes ornatus        | Harlequin bat                      | 1       | 1        | 
Barbastella leucomelas    | Eastern barbastelle                | 1       | 1        | 
Tylonycteris pachypus     | Lesser bamboo bat                  | 1 (1)   | 14 (2)   | 5 
Ia io                     | Great evening bat                  | 1       | 8        | 
Murina leucogaster        | Greater tube-nosed bat             | 1       | 5        | 
Miniopterus schreibersi   | Schreiber's long-fingered bat      | 15 (3)  | 135 (17) | 1 
Pteropodidae 
Cynopterus sphinx         | Greater short-nosed fruit bat      | 2       | 6        | 
Rousettus leschenaulti    | Leschenault's Rousette             | 1       | 31       | 
Total (35 species)        |                                    | 82 (19) | 985 (64) | 

^a Identification of many Myotis specimens was possible only to generic level. 

There were also some species of bats that had high sample 
numbers, but in which all individuals were negative for coronavirus: 
84 individuals of the genus Hipposideros (58 from 
Hipposideros armiger), 101 specimens of Rhinolophus pusillus, 
and 37 samples from two genera of the Pteropodidae. 

To determine the overall diversity of coronaviruses that 
were isolated from bats, we performed a preliminary phylogenetic 
analysis of the RdRp fragment obtained from RT-PCR detection. It 
revealed that all viruses characterized fell within the previously 
recognized coronavirus groups, including the SARS-CoV 
group. Of the 65 viruses, only three bat coronaviruses were 
closely related to SARS-CoV (putative group 4) and 40 clustered 
with group 1 viruses, while the remaining 22 viruses formed 
a separate group that is most closely related to group 2 viruses 
(putative group 5); however, there was no statistical support 
for this relationship (Fig. 2). None of the coronaviruses characterized 
in this study were phylogenetically related to group 3. 

Genetic analysis revealed the presence of species-specific 
host restriction of coronavirus in bats. For all species, but one, 
that were sampled and found to harbor coronavirus, those 
viruses from a single species all clustered together with high 
bootstrap support (Fig. 2). The one exception was R. ferrumequinum, 
which tested positive for group 1, 4, and 5 viruses. 
Furthermore, in instances where the same bat species was 
sampled in different provinces, those species were found to 
harbor coronaviruses that clustered together (Fig. 2). Species 
specificity was also evident when two bat species from the same 
cave in Guangxi, Miniopterus schreibersi and Myotis ricketti, were 
positive for group 1 coronaviruses, represented by BtCoV/A911/05 
and BtCoV/821/05, respectively, but the viruses from 
each species did not cluster together in the phylogenetic analysis 
(Fig. 2). 

FIG. 2. Phylogenetic relationships of 64 coronaviruses isolated from bats in China. The tree was generated based on 440 nucleotides of the 
RNA-dependent RNA polymerase region by the neighbor-joining method in the MEGA program. Numbers above branches indicate neighbor-joining 
bootstrap values (percent) calculated from 1,000 bootstrap replicates. Terminal nodes containing bat coronaviruses isolated in this study 
are collapsed and represented by a blue triangle with the number of viruses indicated within. The tree was rooted to Breda virus (AY427798). Scale 
bar, 0.05 substitution per site. Red text indicates provinces from where viruses were isolated. Abbreviations: AH, Anhui; FJ, Fujian; GD, 
Guangdong; GX, Guangxi; HA, Hainan; HB, Hubei; HE, Henan; JX, Jiangxi; SC, Sichuan; SD, Shandong; YN, Yunnan. 


These findings suggest that genetically divergent coronaviruses 
are commonly present in, and specific to, different species 
of bats in China. 

Genome organization. Based on preliminary phylogenetic 
analysis of the RdRp gene (Fig. 2), four strains, representing 
the diversity of bat coronaviruses isolated in this study, were 
selected for full genome sequencing: BtCoV/Tylonycteris 
pachypus/Guangdong/133/2005 (BtCoV/133/05), BtCoV/Rhinolophus 
ferrumequinum/Hubei/273/2004 (BtCoV/273/04), 
BtCoV/R. macrotis/Hubei/279/2004 (BtCoV/279/04), and 
BtCoV/Scotophilus kuhlii/Hainan/512/2005 (BtCoV/512/05). 
An additional five viruses were selected for partial sequencing 
of the RdRp, HEL, and S genes: BtCoV/S. kuhlii/Hainan/515/2005 
(BtCoV/515/05), BtCoV/S. kuhlii/Hainan/527/2005 (BtCoV/527/05), 
BtCoV/Pipistrellus pipistrellus/Hainan/434/05 (BtCoV/434/05), 
BtCoV/P. abramus/Sichuan/355/2005 (BtCoV/355/05), 
and BtCoV/Myotis ricketti/Yunnan/701/2005 (BtCoV/701/05). Sequences 
generated in this study were analyzed with all available 
coronavirus sequence data in public databases. Comparison of 
the genome organization of bat coronaviruses with that of 
representative strains of other coronaviruses is presented in Fig. 
3 and Table 2. 

All four bat coronaviruses had classic coronavirus genome 
organization in which the replicase gene and structural protein 
genes are arranged in the order 5'-ORF1a and ORF1b, S, E, 
M, and N (Fig. 3). The genome size of these bat coronaviruses 
varied: the longest was 30.3 kb, for BtCoV/133/05, and the 
shortest was 28.2 kb, for BtCoV/512/05. 

FIG. 3. Linear representation of the ORFs of the bat coronaviruses and representative known coronaviruses from each group. Conserved 
functional domains in ORF1a and ORF1b are indicated by yellow boxes. The following predicted domains are shown: papain-like proteases 1 and 
2 (PL1 and PL2), 3C-like protease (3CL), RdRp, metal ion-binding domain (MB), and helicase (Hel). Putative ORFs are indicated by blue boxes 
and numbered according to their order in the genome: BtCoV/R. ferrumequinum/Hubei/273/04 (BtCoV/273/04), BtCoV/R. macrotis/Hubei/279/04 
(BtCoV/279/04), BtCoV/T. pachypus/Guangdong/133/05 (BtCoV/133/05), BtCoV/S. kuhlii/Hainan/512/05 (BtCoV/512/05), SARS-CoV, PEDV, 
avian IBV, and human coronavirus OC43 (HCoV-OC43). 

Putative ORFs coding for nonstructural proteins or accessory 
proteins were deduced and analyzed if transcription-regulating 
sequences (TRSs) were present close to, and upstream 
of, potential initiating methionine residues. The ORFs of nonstructural 
proteins vary significantly among different bat coronaviruses. 
The genome organization of BtCoV/273/04 and that 
of BtCoV/279/04 were essentially the same and were similar to 
that of SARS-CoV. The genome organization of BtCoV/ 
512/05 is most similar to that of porcine epidemic diarrhea 
virus (PEDV), while the genome of BtCoV/133/05 is unlike 
that of all known coronaviruses (Fig. 3). 

In all coronaviruses, approximately the first 
two-thirds of the genome is composed of the two large replicase 
ORFs, ORF1a and ORF1b, which encode virus replicase 
polyproteins pp1a and pp1ab (14). Proteolytic processing end 
products and putative functional domains of the replicase 
polyproteins were identified. The nonstructural proteins nsp1 
and nsp2 were the most variable among these bat coronaviruses, 
while papain-like protease (PL), 3C-like protease (3CL), 
RdRp, metal binding (MB), and HEL functional domains were 
conserved in all genomes, except that of BtCoV/133/05 (Fig. 3). 
Coronaviruses generally employ two papain-like proteases, 
PL1 and PL2, to process the N-proximal regions of the replicative 
polyproteins. PL1 and PL2 were identified in BtCoV/512/05; 
however, only one PL domain was identified in BtCoV/273/04 
and BtCoV/279/04. It is noteworthy that in BtCoV/133/05 
both nsp1 and nsp2 were highly divergent from other 
coronaviruses and that the PL domain could not be identified 
in any of the nonstructural proteins (Fig. 3; Table 2). 

ORFs located between the S and E genes and between the 
M and N genes were predicted and are numbered according to 
their order in the genome (Fig. 3; Table 2). In viruses BtCoV/ 
273/04, BtCoV/279/04, and BtCoV/512/05, there is a single 
ORF between the S and E genes (ORF3). In BtCoV/273/04 
and BtCoV/279/04 ORF3 is predicted to encode a similar protein 
of 274 amino acids (aa) with two predicted TM helices in 
the N-terminal sequence. BLAST and Pfam searches failed to 
identify any sequences similar to this protein. In BtCoV/512/05 
ORF3 encodes a predicted 224-aa protein also with two predicted 
TM domains in the N-terminal sequence. 

The region between the S and E genes in BtCoV/133/05 is 
the longest among all known coronaviruses, at 2,013 bp (Fig. 


TANG ET AL. 


J. Virol. 


TABLE 2. Comparison of coronavirus genome structures^a 

Each cell gives start-end position in bp, with the no. of aa in parentheses. 

Feature      | BtCoV/273/04 (29,704)^b | BtCoV/279/04 (29,741) | SARS-CoV (29,751)     | IBV (27,608) 
ORF1a        | 261-13,382 (4,373)      | 897-13,415 (4,173)    | 265-13,398 (4,378)    | 529-12,354 (3,942) 
nsp1         | 261-797 (179)           | 897-1,157 (87)        | 265-801 (179)         | 529-753 (75) 
nsp2         | 798-2,717 (640)         | 1,158-2,717 (520)     | 802-2,718 (639)       | 754-2,547 (598) 
nsp3 (PLpro) | 2,718-8,468 (1,917)     | 2,718-8,501 (1,928)   | 2,719-8,484 (1,922)   | 2,548-7,323 (1,592) 
nsp5 (3CL)   | 9,969-10,886 (306)      | 10,002-10,919 (306)   | 9,985-10,902 (306)    | 8,866-9,786 (307) 
ORF1b        | 13,382-21,469 (2,695)   | 13,415-21,502 (2,695) | 13,398-21,485 (2,695) | 12,354-20,417 (2,687) 
nsp12 (RdRp) | 13,356-16,150 (932)     | 13,389-16,183 (932)   | 13,372-16,166 (932)   | 12,313-15,131 (940) 
nsp13 (HEL)  | 16,151-17,953 (601)     | 16,184-17,986 (601)   | 16,167-17,969 (601)   | 15,132-16,931 (600) 
Ns2          |                         |                       |                       | 
HE           |                         |                       |                       | 
S            | 21,476-25,201 (1,241)   | 21,509-25,234 (1,241) | 21,492-25,259 (1,255) | 20,368-23,856 (1,162) 
ORF3/3a      | 25,211-26,035 (274)     | 25,244-26,068 (274)   | 25,268-26,092 (274)   | 23,856-24,029 (57) 
ORF3b        |                         |                       |                       | 24,029-24,223 (64) 
ORF3c        |                         |                       |                       | 
E            | 26,060-26,290 (76)      | 26,093-26,323 (76)    | 26,117-26,347 (76)    | 24,207-24,533 (108) 
M            | 26,337-27,002 (221)     | 26,374-27,039 (221)   | 26,398-27,063 (221)   | 24,505-25,182 (225) 
ORF6         | 27,013-27,204 (63)      | 27,050-27,241 (63)    | 27,074-27,265 (63)    | 25,488-25,685 (65) 
ORF7         | 27,212-27,580 (122)     | 27,249-27,617 (122)   | 27,273-27,641 (122)   | 25,682-25,930 (82) 
ORF8         | 27,718-28,086 (122)     | 27,755-28,120 (122)   | 27,864-28,118 (84)    | 
N            | 28,088-29,353 (421)     | 28,135-29,397 (420)   | 28,120-29,388 (422)   | 25,873-27,102 (409) 
ORF10        |                         |                       |                       | 
s2m          | 29,555-29,586           | 29,599-29,630         | 29,590-29,621         | 27,477-27,508 

^a For blank cells the corresponding ORF was either not present or not identified. 
^b Numbers in parentheses after the virus names are genome sizes in base pairs. 


3). Furthermore, in BtCoV/133/05 this region contains three 
predicted ORFs (ORF3a, ORF3b, and ORF3c), with predicted 
proteins of 91, 285, and 227 aa, respectively. Each of 
these ORFs has a conserved TRS upstream: 
UUAACGAACUU (9 nucleotides) AUG for ORF3a and 
UUAACGAACUU AUG for ORF3b and ORF3c. The ORF3c-encoded 
protein contains three TM domains, but no matching 
proteins could be identified. 
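A search for this TRS pattern can be sketched with a regular expression: the motif UUAACGAACUU followed, after a short spacer, by an AUG start codon. The function name, the `max_spacer` default of 9 (taken from the ORF3a spacing quoted above), and the toy sequence are assumptions of this example and do not reproduce the authors' procedure.

```python
import re
from typing import List, Tuple

TRS = "UUAACGAACUU"  # conserved motif quoted in the text


def find_trs_starts(rna: str, trs: str = TRS,
                    max_spacer: int = 9) -> List[Tuple[int, int, int]]:
    """Return (trs_index, aug_index, spacer_length) for each TRS that is
    followed by an AUG within `max_spacer` nucleotides (0-based indices)."""
    # Non-greedy spacer so the nearest downstream AUG is reported
    pattern = re.compile(
        f"{re.escape(trs)}([ACGU]{{0,{max_spacer}}}?)AUG"
    )
    hits = []
    for match in pattern.finditer(rna):
        spacer = match.group(1)
        hits.append((match.start(), match.end() - 3, len(spacer)))
    return hits


# Toy sequence: one TRS directly before AUG, one with a 9-nt spacer
toy = "GGG" + TRS + "AUG" + "CCC" + TRS + "ACGUACGUA" + "AUGAAA"
print(find_trs_starts(toy))
```

On a real genome the input would be the RNA-sense sequence, and each reported AUG would still need to be checked for an open reading frame of plausible length.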

In BtCoV/273/04 and BtCoV/279/04 the region between the 
M and N genes is a 1,085- and a 1,095-bp sequence, respectively, 
that contains three ORFs (ORF6, ORF7, and ORF8) of 
63, 122, and 122 aa, respectively (Fig. 3). ORF7 is predicted to 
have two TM domains, in both the N- and C-terminal sequences, 
while for ORF8 one TM helix is predicted. BLAST 
and Pfam searches failed to identify sequences similar to any of 
the three predicted proteins. This region between the M and N 
genes is absent in BtCoV/133/05 and BtCoV/512/05 (Fig. 3). 
The sequence region between the M and N genes of BtCoV/273/04 
and BtCoV/279/04 and other SARS-like CoVs showed 
a gene organization similar to that of IBV (22, 46). Analysis 
of this region in a representative IBV (NC_001451) revealed 
a much shorter region (692 bp) also with two ORFs (ORF6 
and ORF7) predicted to encode proteins of 65 and 82 aa, 
respectively. However, unlike BtCoV/273/04 and BtCoV/ 
279/04, in IBV no conserved TRSs were identified upstream 
of the three ORFs. 

Downstream of the N gene in BtCoV/512/05, there is a 
387-bp sequence (ORF10) that is predicted to encode a 129-aa 
protein with a putative signal peptide at the N-terminal region 
and three TM domains. This sequence region is absent in all 
known coronaviruses including BtCoV/133/05, BtCoV/273/04, 
and BtCoV/279/04 (Fig. 3). No matching protein was identified 
in GenBank or Pfam. 
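The relation between ORF coordinates and protein length can be checked with a short calculation. The sketch assumes that the start and end positions listed in Table 2 are inclusive and that the end position includes the stop codon; this assumption is inferred from the table values (for example, ORF3 of BtCoV/273/04 at 25,211 to 26,035 and 274 aa) and is not stated by the authors. The function name is hypothetical.

```python
def aa_count(start_bp: int, end_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF given inclusive coordinates."""
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = length // 3
    # Drop the stop codon when the coordinates cover it
    return codons - 1 if includes_stop else codons


# ORF3 of BtCoV/273/04 (Table 2): 825 nt including the stop codon
print(aa_count(25211, 26035))
# The 387-bp ORF10 coding sequence of BtCoV/512/05, stop codon excluded
print(aa_count(1, 387, includes_stop=False))
```

Under the same assumption, the Table 2 coordinates for ORF10 of BtCoV/512/05 (27,571 to 27,960) also give 129 aa.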


The hemagglutinin esterase protein, which is present in 
group 2 coronaviruses (6) and presumably obtained by horizontal 
gene transfer from influenza C virus (48), was not 
present in any of the bat coronaviruses analyzed in this study. 
In the 3' untranslated region a stem-loop II-like (s2m) motif 
(15) was recognized in BtCoV/273/04 and BtCoV/279/04 but 
not in BtCoV/133/05 and BtCoV/512/05 (Fig. 3). This motif is 
also present in group 3 coronaviruses and SARS-CoV but not 
in other coronaviruses (34, 37). 

Sequence similarity. To understand the interrelationship between 
the BtCoVs and the other known coronaviruses, similarity 
analysis within and between groups was conducted (9). 
Analysis of the RdRp amino acid sequence showed that, within 
groups, the similarity ranged from 82 to 99%, while between 
different groups, including the putative groups 4 and 5 in the 
present study, the similarity range was 60 to 74% (Fig. 4A). In 
contrast, within-group similarities of the S protein were from 
59 to 91% and between-group similarities were from 22 to 36% 
(Fig. 4B). Similar patterns were observed for the remaining 
major gene products: more-conserved genes usually had higher 
similarity between different groups, and less-conserved genes 
had lower similarity between groups (data not shown). 

Phylogenetic analysis. To further define the evolutionary 
pathway of those novel BtCoVs, each of the major genes was 
phylogenetically analyzed. In all genes analyzed, represented 
by the HEL and S gene trees, the bat coronaviruses did not 
form a single group (Fig. 5). As in the preliminary analysis, five 
groups, all with 100% bootstrap support, were apparent (Fig. 2 
and 5). The same relationships were apparent in all genes 
analyzed, with the exception of group 1 bat CoVs (BtCoV/512/05, 
BtCoV/515/05, and BtCoV/527/05) and putative group 5 
viruses (represented by BtCoV/133/05). 

In the HEL, N, and E gene phylogenies, putative group 5 
viruses fall as the sister group to the SARS and SARS-like 
CoV group (putative group 4), which also contains two bat 
coronaviruses from this study (BtCoV/273/04 and BtCoV/279/04). 
However, in the S, M, and RdRp gene analyses, group 5 
viruses are most closely related to group 2 coronaviruses. 

TABLE 2 (continued) 

Each cell gives start-end position in bp, with the no. of aa in parentheses. 

Feature      | BtCoV/133/05 (30,307) | HCoV-OC43 (30,738)    | BtCoV/512/05 (28,204) | PEDV (28,033) 
ORF1a        | 260-13,564 (4,435)    | 211-13,341 (4,377)    | 294-12,650 (4,119)    | 297-12,620 (4,108) 
nsp1         | 260-2,800 (847)       | 211-948 (246)         | 294-1,514 (407)       | 297-626 (110) 
nsp2         | 2,801-4,420 (540)     | 949-2,763 (605)       | 1,515-2,984 (490)     | 627-2,918 (764) 
nsp3 (PLpro) | 4,421-8,632 (1,404)   | 2,764-8,460 (1,899)   | 2,985-7,886 (1,634)   | 2,982-7,847 (1,622) 
nsp5 (3CL)   | 10,154-11,071 (306)   | 9,949-10,857 (303)    | 9,330-10,235 (302)    | 9,288-10,193 (302) 
ORF1b        | 13,564-21,639 (2,691) | 13,341-21,497 (2,718) | 12,650-20,674 (2,674) | 12,620-20,641 (2,673) 
nsp12 (RdRp) | 13,541-16,341 (934)   | 13,318-16,100 (928)   | 12,627-15,406 (927)   | 12,597-15,376 (927) 
nsp13 (HEL)  | 16,342-18,135 (598)   | 16,101-17,909 (603)   | 15,407-17,197 (597)   | 15,377-17,167 (597) 
Ns2          |                       | 21,507-22,343 (279)   |                       | 
HE           |                       | 22,355-23,629 (425)   |                       | 
S            | 21,584-25,636 (1,350) | 23,644-27,729 (1,361) | 20,671-24,786 (1,371) | 20,638-24,789 (1,383) 
ORF3/3a      | 25,663-25,938 (91)    | 27,817-28,146 (109)   | 24,786-25,460 (224)   | 24,789-25,463 (224) 
ORF3b        | 26,119-26,976 (285)   |                       |                       | 
ORF3c        | 26,992-27,675 (227)   |                       |                       | 
E            | 27,745-27,993 (82)    | 28,133-28,387 (84)    | 25,441-25,671 (76)    | 25,444-25,674 (76) 
M            | 28,008-28,667 (219)   | 28,402-29,094 (230)   | 25,678-26,361 (227)   | 25,682-26,362 (226) 
ORF6         |                       |                       |                       | 
ORF7         |                       |                       |                       | 
ORF8         |                       |                       |                       | 
N            | 28,705-29,979 (424)   | 29,104-30,450 (448)   | 26,372-27,556 (394)   | 26,374-27,699 (441) 
ORF10        |                       |                       | 27,571-27,960 (129)   | 
s2m          |                       |                       |                       | 


FIG. 4. Similarity histogram of RdRp (A) and spike (B) genes 
based on alignments from the program TransAlign. 


In all genes analyzed, except the S gene, group 1 bat coronaviruses 
are most closely related to PEDV (bootstrap support, 
99%), and these viruses cluster with HCoV-NL63 and 
HCoV-229E (Fig. 5A). In the S gene tree, while group 1 bat 
coronaviruses still clustered together with PEDV, they were 
now most closely related to those coronaviruses from domestic 
animals (Fig. 5B). The relationship of group 1 bat coronaviruses 
to PEDV, transmissible gastroenteritis virus, and feline 
coronaviruses demonstrates that virus transmission may occur 
between bats, livestock, and companion animals, presenting a 
possible pathway for human infection. 

None of the viruses sequenced in this study was the direct
progenitor of SARS. It is noteworthy that within putative
group 4 the SARS-like viruses from bats clustered together,
away from SARS viruses from other mammalian hosts (Fig. 5),
suggesting that other intermediate hosts or viruses were
involved in the emergence of SARS.

Taken together, the above phylogenetic findings demonstrate
that bats have a relatively high diversity of coronaviruses
and harbor a distinct lineage (putative group 5) that may
represent a novel coronavirus group. These relationships are
consistent with the results of the genomic and sequence
similarity analyses.

Recombination analysis. To evaluate if the different gene
phylogenies for group 1 bat CoVs and putative group 5 viruses
were due to recombination, a sliding window analysis was
conducted. Results of this analysis indicated that while some
areas of the RdRp, S, and N genes may be recombinant, there was
no statistical support for this conclusion. Furthermore, those
potentially recombinant areas were highly divergent and
ambiguously aligned, and the different phylogenies were
therefore likely due to variation in the rates of substitution
and not recombination between coronaviruses (13, 30).
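
The sliding-window procedure is not spelled out in this section. As a minimal sketch only (not the authors' software or exact method), the idea can be illustrated by sliding a fixed window along an alignment and recording which reference sequence is most similar to the query in each window. The function names, window sizes, and toy sequences below are all hypothetical and chosen purely for illustration.

```python
def window_identity(seq_a, seq_b):
    """Fraction of identical sites between two aligned segments, ignoring gaps."""
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # a gap in either sequence carries no identity signal
        compared += 1
        if a == b:
            matches += 1
    # None signals that the window had no comparable (ungapped) sites.
    return matches / compared if compared else None


def closest_reference_by_window(query, references, window=200, step=50):
    """For each window, name the reference most similar to the query.

    `references` maps a (hypothetical) name to an aligned sequence of the
    same length as `query`. Returns a list of (window_start, best_name).
    """
    for name, seq in references.items():
        if len(seq) != len(query):
            raise ValueError(f"{name} is not aligned to the query length")
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best_name, best_score = None, -1.0
        for name, seq in references.items():
            score = window_identity(segment, seq[start:start + window])
            if score is not None and score > best_score:
                best_name, best_score = name, score
        calls.append((start, best_name))
    return calls


# Toy aligned sequences (invented for illustration, not real viral data).
query = "AAAAAAAA" + "CCCCCCCC"
references = {
    "ref1": "AAAAAAAA" + "GGGGGGGG",
    "ref2": "TTTTTTTT" + "CCCCCCCC",
}
calls = closest_reference_by_window(query, references, window=8, step=8)
print(calls)
```

A switch in the closest reference along the alignment is only suggestive of recombination. Highly divergent or ambiguously aligned regions can produce the same pattern through variation in substitution rates, which is why statistical support is needed before such a region is called recombinant.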
FIG. 5. Phylogenetic relationships of the helicase (A) and spike (B) genes of representative coronaviruses isolated from bats in China. Trees 
were generated by the neighbor-joining method in the PAUP program. Numbers above branches indicate neighbor-joining bootstrap values 
(percent) calculated from 1,000 bootstrap replicates. Analyses were based on 1,833 nucleotides for the helicase gene and 3,510 nucleotides for the 
spike gene. The trees were rooted to Breda virus (AY427798). Scale bar, 0.1 substitution per site. 


DISCUSSION 

The recent identification of SARS-like and other
coronaviruses in bats suggested that they may play an important
role in the ecology of these viruses. In the present study we
investigated 35 of the 120 bat species identified in China (44)
and revealed that approximately 6% of bats, from 10 different
species sampled in 12 provinces, were positive for coronavirus.
Our findings indicate that bats with coronavirus infection are
commonly observed in this region.

Phylogenetic analyses in the present study revealed high
genetic diversity of coronaviruses in bats from this region.
Apart from the SARS-like viruses, many bat coronaviruses
clustered with existing group 1 viruses, while others formed a
separate lineage that included only viruses from bats (putative
group 5). Within group 1, the bat CoVs did not form a single
group but were highly divergent and related to coronaviruses
previously identified from different domestic animals.

Our findings also revealed that within the SARS and
SARS-like CoV group (putative group 4) the S gene and other
genes clustered into two subgroups, one of bat CoVs and another
of SARS viruses from humans and other mammalian hosts. As
the similarity of the S genes between those two subgroups is
only approximately 80% and since coronaviruses usually have
low mutation rates (24), it seems unlikely that these viruses
have diverged due to host adaptation within such a short time
period. Therefore, the direct progenitor of the SARS-CoV
from civets in the animal markets of southern China and the
ecological and evolutionary pathway that led to the emergence
of SARS have still not been fully determined.

The association between almost all of the coronaviruses that
we sequenced and a single bat species demonstrates a high
degree of host restriction for coronaviruses in bat populations.
For example, similar viruses were detected in Myotis ricketti
from Anhui, Guangdong, and Yunnan, approximately 1,600
km distant, while two different bat species sampled in the same
cave had different coronaviruses. This wide distribution may be
associated with bat migration. It also appears that SARS-like
CoVs from bats are restricted to different species of
Rhinolophus. Furthermore, Hipposideros, which belongs to the
same family as Rhinolophus, and all members of the Pteropodidae
tested negative for coronavirus, even though many individuals
were sampled (Table 1). As such, these viruses may be
restricted to just a few families and genera, and further
information regarding which taxonomic groups of bats may host
coronaviruses will provide an insight into the evolution and
ecology of coronaviruses.

While there have been previous reports of recombination in
coronaviruses as a major evolutionary pattern (18, 19), it is
likely that at least some of this is due to those sequence
areas being highly divergent. This study did not find any
convincing evidence for recombination events in the bat
coronaviruses tested. This information further supports the
high degree of host specificity seen for bat coronaviruses, as
two divergent viruses are unlikely to coinfect the same bat
species, let alone the same individual. However, it must be
noted that there was one instance of a single bat species being
infected with coronaviruses from two different groups (Fig. 2;
Table 1).

It is also possible that coronavirus may cause a persistent or
long-term infection in bat species as observed for other
coronaviruses in vivo and in vitro (5, 36). Each of the previous
studies that have identified coronaviruses in bats has sampled
at various times and in different areas of China, and all have
successfully identified coronaviruses from the samples (23, 25,
32). In addition, the present study was conducted over 17
months in provinces throughout China, and positive samples
were identified almost year-round.

In the present study, all bat coronaviruses tested had
classical coronavirus genome organization (4). However,
BtCoV/133/05 from putative group 5 had the longest genome
characterized from bats, a large noncoding region at the start
of the genome in which we were unable to identify the PL domain,
and also three ORFs between the S and E genes.

The continued identification of novel coronaviruses from
different hosts, especially bats, suggests that coronaviruses are
more diverse than previously thought (14). Therefore, the
classification of the group may need to be modified to match
this increasing diversity. The results of this study suggest
that many novel coronaviruses cannot be easily accommodated in
the current classification, as antigenic data are not available
in many cases due to difficulty in virus isolation (14, 23, 25,
32). Genetic data also indicate that some of these novel
coronaviruses are intermediate strains that fall between the
established groups. Therefore, based on phylogenetic
relationships, low genetic similarity, and unique genome
organization we propose a new putative coronavirus group
(group 5) and also support the suggestion that SARS-like
coronaviruses belong to group 4 (27). The proliferation of
coronaviruses identified from different hosts has also led to
confusion in naming the viruses. We have therefore used a
standardized naming system based on the influenza A virus
convention. While any changes in nomenclature and taxonomy must
be arrived at through consensus in the scientific community, we
believe that it is reasonable to consider these issues.

Considering the diversity of species and the habitats that
they occupy, their large population sizes and densities, and
their ability to migrate, bats appear to be ideal candidates
for the natural reservoirs of all coronaviruses (7). The
current study revealed that coronaviruses in bats exhibit high
genetic diversity and high prevalence across a wide
geographical distribution, possibly with asymptomatic or
persistent infection. However, bats are a large order that
accounts for approximately 20% of extant mammalian species (1),
and so far only a small proportion of the total species number
has been investigated, and those only from China (23, 25, 32).
There is also a general lack of knowledge regarding the
prevalence of coronaviruses in other animal groups, and it will
be difficult to reach solid conclusions until more is known
regarding the frequency and diversity of coronaviruses in other
animals, especially those that share ecological space with
bats.